# Hierarchical Vision Transformer
Hiera Abswin Base Mim
Apache-2.0
A Hiera image encoder employing an absolute window position embedding strategy, pre-trained via Masked Image Modeling (MIM), serving as a general-purpose feature extractor or backbone network for downstream tasks.
Image Classification
birder-project
72
0
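The entry above describes the encoder as a general-purpose feature extractor rather than a classifier. As a minimal sketch of that usage with the Hugging Face transformers Auto classes, the snippet below pools a global feature vector from a Hiera encoder; the `facebook/hiera-base-224-hf` repo id is taken from the Facebook entries further down and is only a stand-in here, since the birder-project checkpoint itself may need the birder toolkit's own loading path. The image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Stand-in checkpoint (assumption): any transformers-compatible Hiera encoder loads the same way.
ckpt = "facebook/hiera-base-224-hf"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

image = Image.open("example.jpg").convert("RGB")  # placeholder: any RGB image on disk
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final token sequence into one global descriptor for downstream use.
features = outputs.last_hidden_state.mean(dim=1)
print(features.shape)  # (1, hidden_dim)
```

Mean-pooling the last hidden states is just one pooling choice; a downstream head could equally consume the per-stage features that a hierarchical backbone exposes.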
Hiera Huge 224 Hf
Hiera is an efficient hierarchical vision Transformer model that excels at image and video tasks while maintaining fast runtime.
Image Classification
Transformers English

facebook
41
1
Hiera Large 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple, surpassing prior state-of-the-art models on image and video tasks while running faster.
Image Classification
Transformers English

facebook
532
1
Hiera Base Plus 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple, surpassing prior state-of-the-art results across a wide range of image and video tasks while running significantly faster.
Image Classification
Transformers English

facebook
15
0
Hiera Base 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple, excelling at image and video tasks.
Image Classification
Transformers English

facebook
163
0
Hiera Base 224 In1k Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple. It surpasses prior state-of-the-art results across a wide range of image and video tasks while running significantly faster.
Image Classification
Transformers English

facebook
188
2
Hiera Small 224 Hf
Hiera is a hierarchical vision Transformer model that combines speed, strong performance, and a minimalist design, surpassing prior state-of-the-art models on image and video tasks with high computational efficiency.
Image Classification
Transformers English

facebook
23
0
Hiera Tiny 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and extremely simple. It surpasses prior state-of-the-art methods across a wide range of image and video tasks while running significantly faster.
Image Classification
Transformers English

facebook
8,208
0
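For the classification-tagged Hiera entries above, a minimal inference sketch with the transformers Auto classes might look like the following. The repo id is assumed from the entry names (the `...-in1k-hf` variant is the one fine-tuned on ImageNet-1k with a classification head), and the image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumed repo id derived from the "Hiera Base 224 In1k Hf" entry above.
ckpt = "facebook/hiera-base-224-in1k-hf"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Top-1 ImageNet-1k label.
print(model.config.id2label[logits.argmax(-1).item()])
```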
Upernet Swin Large
MIT
UperNet is a framework for semantic segmentation; this checkpoint pairs it with a Swin Transformer backbone for pixel-level scene understanding.
Image Segmentation
Transformers English

openmmlab
3,251
0
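A hedged usage sketch for the UperNet entry: transformers exposes `UperNetForSemanticSegmentation`, and the repo id below is assumed from the entry name, so substitute whichever UperNet-with-Swin checkpoint you actually use. The image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, UperNetForSemanticSegmentation

# Assumed repo id derived from the "Upernet Swin Large" entry above.
ckpt = "openmmlab/upernet-swin-large"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = UperNetForSemanticSegmentation.from_pretrained(ckpt)

image = Image.open("street.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Upsample the logits back to the input resolution and take the per-pixel argmax.
seg = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
print(seg.shape)  # (height, width) map of class indices
```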
Nat Small In1k 224
MIT
NAT-Small is a hierarchical vision transformer based on neighborhood attention, designed for image classification tasks.
Image Classification
Transformers Other

shi-labs
6
0
Dinat Mini In1k 224
MIT
DiNAT-Mini is a hierarchical vision Transformer model based on the dilated neighborhood attention mechanism, designed for image classification tasks.
Image Classification
Transformers

shi-labs
462
1
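The two neighborhood-attention entries above (NAT and DiNAT) load through the same transformers Auto classes, with the caveat that they depend on the separate `natten` package for the neighborhood attention kernels. A minimal sketch, with repo ids assumed from the entry names and a placeholder image path:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Requires `pip install natten` in addition to transformers.
# Assumed repo ids derived from the entries above.
ckpt = "shi-labs/dinat-mini-in1k-224"   # or "shi-labs/nat-small-in1k-224"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[logits.argmax(-1).item()])
```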
Swinv2 Large Patch4 Window12to24 192to384 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k at 384x384 resolution, featuring hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
3,048
4
Swinv2 Large Patch4 Window12to16 192to256 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
812
4
Swinv2 Base Patch4 Window12to24 192to384 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
1,824
0
Swinv2 Base Patch4 Window12to16 192to256 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image classification through hierarchical feature maps and local window-based self-attention mechanisms.
Image Classification
Transformers

microsoft
459
1
Swinv2 Base Patch4 Window12 192 22k
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image processing through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
8,603
3
Swinv2 Base Patch4 Window16 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
1,853
3
Swinv2 Base Patch4 Window8 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
16.61k
7
Swinv2 Small Patch4 Window16 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image processing through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
315
1
Swinv2 Tiny Patch4 Window16 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image classification through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
403.69k
5
Swinv2 Tiny Patch4 Window8 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model pre-trained on ImageNet-1k, featuring hierarchical feature maps and local window self-attention mechanisms with linear computational complexity.
Image Classification
Transformers

microsoft
25.04k
10
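For the Swin Transformer v2 entries, the sketch below runs classification and also requests the intermediate hidden states, whose shrinking token counts are the hierarchical feature maps the descriptions refer to. The repo id is assumed from the entry name, and the image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumed repo id derived from the "Swinv2 Tiny Patch4 Window8 256" entry above.
ckpt = "microsoft/swinv2-tiny-patch4-window8-256"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Classification result.
print(model.config.id2label[outputs.logits.argmax(-1).item()])

# Token count drops stage by stage as patches are merged: the hierarchical feature maps.
for i, h in enumerate(outputs.hidden_states):
    print(i, tuple(h.shape))  # (batch, num_tokens, hidden_dim)
```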
Swin Large Patch4 Window12 384
Apache-2.0
Swin Transformer is a hierarchical vision Transformer model based on shifted windows, specifically designed for image classification tasks.
Image Classification
Transformers

microsoft
22.77k
1
Swin Base Patch4 Window7 224 In22k
Apache-2.0
Swin Transformer is a hierarchical window-based vision Transformer model pretrained on the ImageNet-21k dataset, suitable for image classification tasks.
Image Classification
Transformers

microsoft
13.30k
15
Swin Large Patch4 Window12 384 In22k
Apache-2.0
Swin Transformer is a hierarchical window-based vision Transformer model, pretrained on the ImageNet-21k dataset, suitable for image classification tasks.
Image Classification
Transformers

microsoft
1,063
7
Swin Base Patch4 Window12 384 In22k
Apache-2.0
Swin Transformer is a hierarchical vision Transformer based on shifted windows, specifically designed for image classification tasks.
Image Classification
Transformers

microsoft
2,431
1
Swin Small Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical window-based vision Transformer model designed for image classification tasks, with computational complexity linearly related to input image size.
Image Classification
Transformers

microsoft
2,028
1
Swin Tiny Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical vision Transformer that achieves linear computational complexity by computing self-attention within local windows, making it suitable for image classification tasks.
Image Classification
Transformers

microsoft
98.00k
42
Swin Base Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical vision transformer based on shifted windows, suitable for image classification tasks.
Image Classification
Transformers

microsoft
281.49k
15
Swin Large Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical vision Transformer that achieves linear computational complexity by computing self-attention within local windows, making it suitable for image classification and dense recognition tasks.
Image Classification
Transformers

microsoft
2,079
1
Swin Large Patch4 Window7 224 In22k
Apache-2.0
Swin Transformer is a hierarchical vision transformer based on shifted windows, pretrained on the ImageNet-21k dataset, suitable for image classification tasks.
Image Classification
Transformers

microsoft
387
2
Swin Base Patch4 Window12 384
Apache-2.0
Swin Transformer is a hierarchical vision transformer based on shifted windows, specifically designed for image classification tasks, with computational complexity linear in the input image size.
Image Classification
Transformers

microsoft
1,421
4
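Finally, for the original Swin Transformer checkpoints, the high-level transformers pipeline API is enough for a quick test. The repo id is assumed from the "Swin Base Patch4 Window12 384" entry above, and any of the microsoft/swin-* classification checkpoints in this list should behave the same way; the image path is a placeholder.

```python
from transformers import pipeline

# Assumed repo id derived from the entry name above.
classifier = pipeline("image-classification", model="microsoft/swin-base-patch4-window12-384")

# Path or URL to any RGB image (placeholder).
for pred in classifier("cat.jpg"):
    print(f'{pred["label"]}: {pred["score"]:.3f}')
```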